
DepWatch 🛡️

Dependency vulnerability scanner powered by OSV.dev and your choice of LLM.

Upload a requirements.txt or package.json. DepWatch queries every pinned dependency against the OSV.dev vulnerability database, then sends the raw findings to an AI provider — Anthropic Claude or Alibaba Qwen — which explains each CVE in plain English, constructs a realistic exploit scenario, and recommends a specific remediation. Results are ranked by urgency and persisted to Postgres so you can track a project's vulnerability profile over time.


Table of Contents

  1. Features
  2. Screenshots
  3. How It Works
  4. Tech Stack
  5. Project Structure
  6. Supported File Formats
  7. AI Providers
  8. OSV.dev — Data Source
  9. Prerequisites
  10. Quick Start — Docker Compose
  11. Local Development — Without Docker
  12. Environment Variables Reference
  13. API Reference
  14. Running Tests
  15. CI / GitHub Actions
  16. Database & Migrations
  17. Design Decisions
  18. Troubleshooting
  19. Contributing
  20. Licence

Features

| Feature | Detail |
| --- | --- |
| File upload | requirements.txt (Python/pip) and package.json (Node/npm) |
| Batch OSV queries | All dependencies checked in a single HTTP call — no API key required |
| AI enrichment | Plain-English CVE explanation, exploit scenario, and version-specific fix |
| Dual AI provider | Switch between Anthropic Claude and Alibaba Qwen with one env var |
| Urgency ranking | Every CVE classified as Immediate, Soon, or Low Priority |
| Severity scoring | CVSS scores extracted from OSV data; Critical / High / Medium / Low labels |
| Summary dashboard | At-a-glance counts: total deps, vulnerable packages, breakdown by urgency and severity |
| Sortable table | Vulnerabilities sorted by urgency then CVSS score; each row expands to show full detail |
| Scan history | Every scan persisted to Postgres; full reports accessible at any time |
| Graceful degradation | If the AI provider is unreachable, raw OSV data is returned with auto-generated stubs — the scan never silently fails |
| Docker Compose | One command boots Postgres + FastAPI backend + Reflex frontend |
| GitHub Actions CI | pytest + Ruff on every push; runs fully offline (SQLite, no network) |

Screenshots

*Upload screen · Scan report · History*

How It Works

┌──────────────────────────────────────────────────────────────────────────────┐
│  Browser  ·  Reflex UI  (port 3000)                                          │
│  Pure Python → compiles to Vite + React at build time                        │
└─────────────────────────────────┬────────────────────────────────────────────┘
                                  │  HTTP POST multipart/form-data
                                  ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│  FastAPI backend  (port 8000)                                                │
│                                                                              │
│  POST /api/v1/scan/                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐     │
│  │ 1. Receive UploadFile (≤ 5 MB)                                      │     │
│  │ 2. Detect type from filename (.txt → requirements, .json → npm)     │     │
│  │ 3. Parse → extract pinned (==) dependencies only                    │     │
│  │ 4. POST /v1/querybatch → OSV.dev ──────────────────────────────────────► │
│  │ 5. Flatten CVEs → batch prompt → AI provider ◄──────────────────────── │
│  │    (Anthropic claude-sonnet-4-20250514  OR  Qwen qwen-plus)         │     │
│  │ 6. Persist Scan + Vulnerability rows to Postgres                    │     │
│  │ 7. Return ScanResponse JSON                                         │     │
│  └─────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  GET /api/v1/history/          paginated scan list                           │
│  GET /api/v1/history/{id}      full report for a past scan                  │
│  GET /health                   liveness probe                                │
└──────────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
                      ┌────────────────────────┐
                      │  PostgreSQL 17         │
                      │  scans                 │
                      │  vulnerabilities       │
                      └────────────────────────┘

Request lifecycle in detail

  1. Parse — app/parsers/requirements.py or app/parsers/package_json.py extracts (name, version, ecosystem) tuples. Only exact-version pins (== / bare semver) are retained; ranges, VCS URLs, and wildcards are skipped and returned to the caller as skipped_lines.

  2. OSV query — app/services/osv_client.py builds a single POST /v1/querybatch payload. OSV returns vulnerability lists in the same order as the query, so order is always preserved. Chunks of 100 are used as a safety margin below OSV's 1 000-item batch limit.

  3. AI enrichment — app/services/ai_enrichment.py flattens all (dep, [OsvVulnerability]) pairs into a JSON array and sends it to the configured provider in a single call (sub-batched at 50 CVEs to respect context limits). The system prompt — defined as a constant in app/services/prompts.py — instructs the model to return a strict JSON array matching the VulnerabilityResult schema. Three fallback levels handle malformed responses: direct parse → embedded-array extraction → stub generation.

  4. Persist — A Scan row and one Vulnerability row per CVE are written inside the same get_db session. The commit is handled by the get_db dependency, not the route handler.

  5. Respond — The route returns a ScanResponse with the full vulnerability list, summary statistics, and scan metadata.
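The three fallback levels in step 3 can be sketched in a few lines. This is a simplified illustration, not the code from app/services/ai_enrichment.py — the function name and the stub shape here are hypothetical:

```python
import json
import re


def parse_enrichment_response(raw: str, expected: int) -> list[dict]:
    """Illustrative three-level fallback for a model response."""
    # Level 1: the model returned a clean JSON array.
    try:
        data = json.loads(raw)
        if isinstance(data, list):
            return data
    except json.JSONDecodeError:
        pass
    # Level 2: the array is embedded in prose or a code fence.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            if isinstance(data, list):
                return data
        except json.JSONDecodeError:
            pass
    # Level 3: give up on the model output and emit one stub per CVE.
    return [
        {"remediation": "Upgrade to the latest patched version."}
        for _ in range(expected)
    ]
```

Because the stub level always succeeds, a malformed model response degrades the report rather than failing the scan.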


Tech Stack

| Layer | Library | Version | Notes |
| --- | --- | --- | --- |
| Frontend | Reflex | 0.8.27 | Pure Python → Vite + React |
| Backend framework | FastAPI | 0.115.6 | Async, OpenAPI auto-docs |
| ASGI server | Uvicorn | 0.32.1 | With standard extras (watchfiles, httptools) |
| File upload | python-multipart | 0.0.20 | FastAPI UploadFile dependency |
| HTTP client | httpx | 0.28.1 | Async; used for OSV + Reflex→backend calls |
| Data validation | Pydantic v2 | 2.10.4 | Models for OSV responses and API shapes |
| Settings | pydantic-settings | 2.7.0 | .env loading with type coercion |
| ORM | SQLAlchemy | 2.0.36 | Async 2.0 style |
| DB driver | asyncpg | 0.30.0 | Async PostgreSQL driver |
| Migrations | Alembic | 1.14.0 | Async-aware env.py |
| Database | PostgreSQL | 17 (Alpine) | UUID PKs, timezone-aware timestamps |
| AI — Anthropic | anthropic | 0.43.0 | AsyncAnthropic client |
| AI — Qwen | openai | 1.58.3 | OpenAI SDK pointed at Dashscope |
| Testing | pytest | 8.3.4 | + pytest-asyncio 0.24.0, pytest-httpx 0.35.0 |
| Linting | Ruff | 0.9.1 | Lint + format check in CI |
| Containerisation | Docker Compose | v2 | Three services: db, backend, frontend |
| CI | GitHub Actions | — | Ubuntu latest, Python 3.12 |

Project Structure

depwatch/
│
├── .github/
│   └── workflows/
│       └── ci.yml                    # pytest + ruff on push / PR to main
│
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── config.py                 # pydantic-settings: all env vars + provider helpers
│   │   ├── database.py               # async engine, AsyncSessionLocal, Base, get_db
│   │   ├── main.py                   # FastAPI app factory: lifespan, CORS, router mounts
│   │   │
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   └── scan.py               # Scan + Vulnerability SQLAlchemy ORM models
│   │   │
│   │   ├── parsers/
│   │   │   ├── __init__.py
│   │   │   ├── requirements.py       # requirements.txt → DependencyItem list
│   │   │   └── package_json.py       # package.json → DependencyItem list
│   │   │
│   │   ├── routers/
│   │   │   ├── __init__.py
│   │   │   ├── scan.py               # POST /api/v1/scan/ — full scan pipeline
│   │   │   └── history.py            # GET /api/v1/history/ + GET /api/v1/history/{id}
│   │   │
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   ├── osv.py                # Pydantic models mirroring OSV.dev API response
│   │   │   └── scan.py               # API-layer request/response schemas
│   │   │
│   │   └── services/
│   │       ├── __init__.py
│   │       ├── prompts.py            # VULNERABILITY_ENRICHMENT_SYSTEM_PROMPT constant
│   │       ├── osv_client.py         # async httpx OSV.dev batch client
│   │       └── ai_enrichment.py      # BaseEnrichmentService + Anthropic/Qwen subclasses
│   │
│   ├── alembic/
│   │   ├── env.py                    # async-aware Alembic environment
│   │   └── versions/
│   │       └── 0001_initial.py       # scans + vulnerabilities tables + indexes
│   │
│   ├── tests/
│   │   ├── conftest.py               # in-memory SQLite fixtures + httpx ASGI client
│   │   ├── test_parsers.py           # 13 parser unit tests
│   │   ├── test_osv_client.py        # 7 OSV client tests (httpx mocked)
│   │   ├── test_ai_enrichment.py     # 24 enrichment tests (both providers + factory)
│   │   └── test_routes.py            # 12 route integration tests
│   │
│   ├── alembic.ini
│   ├── Dockerfile                    # multi-stage build, non-root runtime user
│   ├── pytest.ini                    # asyncio_mode = auto
│   └── requirements.txt
│
├── frontend/
│   ├── depwatch/
│   │   ├── __init__.py
│   │   └── depwatch.py               # complete Reflex app: AppState + all pages
│   ├── Dockerfile                    # Python 3.12 + Node 20 LTS
│   ├── requirements.txt
│   └── rxconfig.py                   # Reflex port config (frontend 3000, backend 3001)
│
├── .env.example                      # all variables documented with defaults
├── .gitignore
├── docker-compose.yml                # db + backend + frontend with health checks
└── README.md

Supported File Formats

requirements.txt — Python / pip

DepWatch extracts only pinned dependencies (the == operator). This is an intentional constraint: OSV.dev's /querybatch API requires an exact version string. Range specifiers like >=1.0,<2.0 cannot be resolved to a single version without running pip install; scanning them would produce unreliable results.

What is scanned:

flask==3.0.0
requests==2.31.0
uvicorn[standard]==0.32.1        # extras ([...]) are stripped — name becomes "uvicorn"
Django==4.2.0 ; python_version>="3.10"  # environment markers stripped

What is skipped (returned in the API response as skipped_lines for transparency):

flask>=3.0.0                     # range constraint
requests~=2.28                   # compatible release
sqlalchemy                       # unpinned — no version at all
git+https://github.com/org/pkg   # VCS URL
-r other-requirements.txt        # include directive
--index-url https://pypi.org/    # option flag
https://example.com/pkg.whl      # direct URL
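The pin-only rule above can be sketched with a single regular expression. This is a minimal illustration of the behaviour described, not the parser in app/parsers/requirements.py — the names and the exact pattern are assumptions:

```python
import re

# Name, optional extras ([standard]), "==", then a version that stops at
# whitespace, a comment, or an environment marker (";").
PIN_RE = re.compile(r"^([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?\s*==\s*([^;\s#]+)")


def parse_requirements(text: str) -> tuple[list[tuple[str, str]], list[str]]:
    pinned, skipped = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored entirely
        m = PIN_RE.match(stripped)
        if m:
            # extras and environment markers are dropped from the result
            pinned.append((m.group(1), m.group(2)))
        else:
            skipped.append(stripped)  # ranges, VCS URLs, flags, bare names
    return pinned, skipped
```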

package.json — Node.js / npm

DepWatch reads dependencies, devDependencies, and peerDependencies. Duplicate packages (appearing in more than one group) are de-duplicated, with the first occurrence winning.

What is scanned (exact semver only):

{
  "dependencies": {
    "express": "4.18.2",
    "semver":  "=7.5.4"
  }
}

What is skipped:

{
  "dependencies": {
    "lodash":    "^4.17.21",
    "axios":     "~1.6.0",
    "react":     ">=18.0.0",
    "my-lib":    "file:../my-lib",
    "workspace": "workspace:*",
    "from-git":  "github:user/repo"
  }
}
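The exact-semver filter and first-occurrence de-duplication can be sketched like this. The regex and function name are illustrative, not the actual app/parsers/package_json.py implementation:

```python
import json
import re

# Exact semver only, with an optional leading "=" and optional pre-release tag.
EXACT_RE = re.compile(r"^=?(\d+\.\d+\.\d+(?:-[0-9A-Za-z.\-]+)?)$")


def parse_package_json(text: str) -> dict[str, str]:
    data = json.loads(text)
    deps: dict[str, str] = {}
    for group in ("dependencies", "devDependencies", "peerDependencies"):
        for name, spec in data.get(group, {}).items():
            m = EXACT_RE.match(spec)
            # ranges (^, ~, >=), file:/workspace:/git specs fail the match;
            # a package seen in an earlier group wins over later duplicates
            if m and name not in deps:
                deps[name] = m.group(1)
    return deps
```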

Tip: To maximise scan coverage, pin all your production dependencies. Run pip freeze > requirements.txt or npm shrinkwrap to generate a fully-pinned lockfile suitable for DepWatch.


AI Providers

DepWatch supports two interchangeable AI providers, selected by a single environment variable. Both receive the same system prompt and return the same structured JSON — switching providers requires no code changes.

Anthropic Claude (default)

AI_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
  • Default model: claude-sonnet-4-20250514
  • Override: ANTHROPIC_MODEL=claude-opus-4-5 (or any available model)
  • SDK: anthropic Python SDK, AsyncAnthropic client
  • Get a key: https://console.anthropic.com

Claude is called via messages.create() with a system parameter containing the structured enrichment prompt. The response is the first content[0].text block.

Alibaba Qwen via Dashscope

AI_PROVIDER=qwen
QWEN_API_KEY=sk-...
  • Default model: qwen-plus — balanced capability and cost for structured JSON
  • Override: QWEN_MODEL=qwen-max (highest capability) or qwen-turbo (fastest / cheapest)
  • SDK: openai Python SDK, AsyncOpenAI client pointed at the Dashscope endpoint
  • Endpoint: https://dashscope.aliyuncs.com/compatible-mode/v1 (OpenAI-compatible)
  • Get a key: https://dashscope.aliyuncs.com

Because Dashscope's API is fully OpenAI-compatible, no Qwen-specific SDK is required. The openai SDK is used with a custom base_url. Qwen calls include response_format={"type": "json_object"} to encourage strict JSON output (supported by qwen-plus and qwen-max; remove this if using qwen-turbo).

Provider comparison

| | Anthropic Claude Sonnet | Qwen Plus | Qwen Max |
| --- | --- | --- | --- |
| JSON reliability | Excellent | Very good | Excellent |
| Context window | 200k tokens | 131k tokens | 32k tokens |
| Speed | Fast | Fast | Moderate |
| Cost | $$ | $ | $$ |
| Best for | Production use | Cost-sensitive / China region | Maximum accuracy |

How provider selection works

AI_PROVIDER=anthropic  →  AnthropicEnrichmentService  (anthropic SDK)
AI_PROVIDER=qwen       →  QwenEnrichmentService        (openai SDK + Dashscope)
(missing key)          →  StubEnrichmentService         (OSV data + auto labels)

The factory function get_enrichment_service() in app/services/ai_enrichment.py returns a cached singleton. All three classes extend BaseEnrichmentService, which owns the JSON-parsing pipeline (direct parse → embedded-array extraction → stubs). Subclasses implement only _call_api(user_message) -> str.
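The selection-plus-caching pattern looks roughly like the following. This is a hypothetical mirror of the logic — the real factory takes no arguments and reads the provider and key from settings, and the class bodies are elided here:

```python
from functools import lru_cache


class StubEnrichmentService: ...
class AnthropicEnrichmentService: ...
class QwenEnrichmentService: ...


@lru_cache(maxsize=None)
def get_enrichment_service(provider: str, api_key: str):
    if not api_key:
        return StubEnrichmentService()  # graceful degradation path
    if provider == "qwen":
        return QwenEnrichmentService()
    return AnthropicEnrichmentService()
```

With lru_cache, repeated calls return the same instance; cache_clear() plays the role that reset_enrichment_service() plays in the test suite.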

Graceful degradation

If the selected provider's API key is absent or the API call fails at runtime, BaseEnrichmentService catches the exception and falls back to stub objects. Stubs are auto-populated from raw OSV data:

  • plain_english_explanation ← OSV summary field
  • severity ← derived from CVSS score
  • urgency ← derived from severity label
  • remediation ← "Upgrade to the latest patched version."

The scan response is returned successfully — the user sees real CVE identifiers and CVSS scores even without AI enrichment.


OSV.dev — Data Source

OSV.dev (Open Source Vulnerabilities) is a free, open vulnerability database maintained by Google. It aggregates advisories from:

| Source | Ecosystems |
| --- | --- |
| GitHub Security Advisories (GHSA) | All GitHub-hosted packages |
| CVE Programme (NVD) | Cross-ecosystem CVEs |
| PyPI Advisory Database | Python / pip |
| npm Advisory Database | JavaScript / npm |
| RustSec | Rust / Cargo |
| Go Vulnerability Database | Go modules |
| OSS-Fuzz | C, C++ and more |
| …and many more | Maven, NuGet, Hex, Pub, etc. |

How DepWatch uses OSV

DepWatch calls POST https://api.osv.dev/v1/querybatch with all pinned dependencies in a single request body:

{
  "queries": [
    { "package": { "name": "flask",    "ecosystem": "PyPI" }, "version": "1.0.0" },
    { "package": { "name": "requests", "ecosystem": "PyPI" }, "version": "2.25.0" }
  ]
}

OSV returns results in the same order as the queries. Each result is a list of OsvVulnerability objects, which may include:

  • id — OSV identifier (e.g. GHSA-xxxx-yyyy-zzzz) or CVE-YYYY-NNNNN
  • aliases — cross-references including CVE IDs when the primary ID is a GHSA
  • severity — CVSS vector string (e.g. CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)
  • summary — one-line description
  • details — full advisory text

DepWatch extracts a numeric CVSS base score from the vector string using a three-level fallback: direct float parse → trailing number in vector → qualitative label mapping (HIGH → 7.5, CRITICAL → 9.5, etc.).

No API key is required. OSV.dev is fully free and open.


Prerequisites

| Requirement | Minimum version | Notes |
| --- | --- | --- |
| Git | any | |
| Docker Desktop | 4.x | Includes Docker Compose v2 |
| Python | 3.12 | Only needed for local dev without Docker |
| Node.js | 20 LTS | Only needed for local dev without Docker (Reflex build pipeline) |
| Anthropic API key | — | Required if AI_PROVIDER=anthropic. Free tier available. |
| Qwen / Dashscope key | — | Required if AI_PROVIDER=qwen. Free tier available. |

Quick Start — Docker Compose

This is the recommended way to run DepWatch. All three services (Postgres, backend, frontend) start with a single command.

# 1. Clone the repository
git clone https://github.com/yourname/depwatch.git
cd depwatch

# 2. Create your .env file from the template
cp .env.example .env

Open .env and set your AI provider:

# Option A — Anthropic Claude (default)
AI_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Option B — Alibaba Qwen
# AI_PROVIDER=qwen
# QWEN_API_KEY=sk-your-dashscope-key-here
# 3. Build images and start all services
docker compose up --build

# To run in the background:
docker compose up --build -d

What happens on first boot:

| Step | Duration | Detail |
| --- | --- | --- |
| Postgres starts | ~3 s | Health check waits for pg_isready |
| Backend starts | ~5 s | FastAPI runs Base.metadata.create_all() on boot |
| Reflex compiles | ~60 s | Vite bundle is compiled once; cached in the reflex_web Docker volume on subsequent starts |

Once running:

| URL | What it is |
| --- | --- |
| http://localhost:3000 | DepWatch web UI |
| http://localhost:8000/docs | FastAPI interactive API docs (Swagger UI) |
| http://localhost:8000/redoc | FastAPI API docs (ReDoc) |
| http://localhost:8000/health | Liveness probe |

Useful Compose commands:

# View logs for a specific service
docker compose logs -f backend
docker compose logs -f frontend

# Stop all services (preserves data volumes)
docker compose down

# Stop and remove all data (full reset)
docker compose down -v

# Rebuild a single service after a code change
docker compose up --build backend

Local Development — Without Docker

Use this approach when you want faster iteration cycles or need to attach a debugger.

Step 1 — Start Postgres

You still need Postgres running. The simplest way is to start just the DB container:

docker compose up db -d
# Postgres is now available on localhost:5432
# User: depwatch  Password: depwatch  Database: depwatch

Or use an existing local Postgres instance and update DATABASE_URL in .env accordingly.

Step 2 — Backend

cd backend

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# Install all dependencies (includes both anthropic and openai SDKs)
pip install -r requirements.txt

# Install the async SQLite driver used only in tests
pip install aiosqlite

# Copy and configure environment
cp ../.env.example .env
# Edit .env: set AI_PROVIDER and the corresponding API key

# Run database migrations
alembic upgrade head

# Start the development server with hot reload
uvicorn app.main:app --reload --port 8000

The API is now running at http://localhost:8000. Interactive docs at http://localhost:8000/docs.

Step 3 — Frontend

Open a second terminal:

cd frontend

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

# First-time initialisation (creates the .web/ directory and installs npm deps)
reflex init

# Start the development server with hot reload
reflex run

The UI is now running at http://localhost:3000. Reflex hot-reloads on Python file saves.

Step 4 — Verify the stack

# Check the backend health endpoint
curl http://localhost:8000/health
# → {"status": "ok", "version": "1.0.0"}

# Test a scan via curl (replace with your own requirements.txt)
curl -X POST http://localhost:8000/api/v1/scan/ \
  -F "file=@/path/to/requirements.txt"

Environment Variables Reference

All variables can be set in the .env file at the project root. Docker Compose reads them automatically. For local development without Docker, place .env in the backend/ directory.

AI Provider

| Variable | Default | Required | Description |
| --- | --- | --- | --- |
| AI_PROVIDER | anthropic | No | Active AI provider. Options: anthropic, qwen |

Anthropic

| Variable | Default | Required | Description |
| --- | --- | --- | --- |
| ANTHROPIC_API_KEY | "" | If AI_PROVIDER=anthropic | Your Anthropic API key. Get one at https://console.anthropic.com |
| ANTHROPIC_MODEL | claude-sonnet-4-20250514 | No | Claude model to use for enrichment |

Qwen

| Variable | Default | Required | Description |
| --- | --- | --- | --- |
| QWEN_API_KEY | "" | If AI_PROVIDER=qwen | Your Dashscope API key. Get one at https://dashscope.aliyuncs.com |
| QWEN_MODEL | qwen-plus | No | Qwen model. Options: qwen-turbo, qwen-plus, qwen-max, qwen-long |
| QWEN_BASE_URL | https://dashscope.aliyuncs.com/compatible-mode/v1 | No | Dashscope OpenAI-compatible endpoint URL |

Database

| Variable | Default | Required | Description |
| --- | --- | --- | --- |
| DATABASE_URL | postgresql+asyncpg://depwatch:depwatch@localhost:5432/depwatch | Yes | SQLAlchemy async DSN. Use postgresql+asyncpg:// scheme |

Note: docker-compose.yml overrides DATABASE_URL with the internal Docker network hostname db. You only need to set this manually for local dev without Docker.

Application

| Variable | Default | Required | Description |
| --- | --- | --- | --- |
| ENVIRONMENT | development | No | Set to production to suppress SQL echo logging |
| LOG_LEVEL | INFO | No | Python logging level: DEBUG, INFO, WARNING, ERROR |

Full .env example

# .env — copy from .env.example and fill in your values

AI_PROVIDER=anthropic

ANTHROPIC_API_KEY=sk-ant-api03-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514

# QWEN_API_KEY=sk-...
# QWEN_MODEL=qwen-plus
# QWEN_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1

DATABASE_URL=postgresql+asyncpg://depwatch:depwatch@localhost:5432/depwatch

ENVIRONMENT=development
LOG_LEVEL=INFO

API Reference

The FastAPI backend exposes an OpenAPI spec at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc. All endpoints are prefixed with /api/v1 except /health.


POST /api/v1/scan/

Upload a dependency file and receive a full vulnerability report.

Request

Content-Type: multipart/form-data
| Field | Type | Description |
| --- | --- | --- |
| file | UploadFile | A requirements.txt or package.json. Maximum 5 MB. |

Response 201 Created

{
  "scan_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "filename": "requirements.txt",
  "file_type": "requirements",        // "requirements" | "package_json"
  "created_at": "2025-03-21T10:30:00Z",
  "summary": {
    "total_dependencies": 24,
    "vulnerable_count": 6,
    "immediate_count": 3,
    "soon_count": 2,
    "low_priority_count": 1,
    "critical_count": 1,
    "high_count": 2,
    "medium_count": 2,
    "low_count": 1
  },
  "vulnerabilities": [
    {
      "id": "7d8e9f10-...",
      "package_name": "flask",
      "package_version": "1.0.0",
      "cve_id": "CVE-2023-30861",
      "cvss_score": 7.5,
      "severity": "High",
      "plain_english_explanation": "Flask before 2.3.2 mishandles the Vary header...",
      "exploit_scenario": "An attacker and victim share a reverse-proxy cache...",
      "remediation": "Upgrade to flask==2.3.3 or later.",
      "urgency": "Immediate"           // "Immediate" | "Soon" | "Low Priority"
    }
    // ...one object per CVE
  ]
}

Error responses

| Status | Condition |
| --- | --- |
| 413 Request Entity Too Large | File exceeds 5 MB |
| 422 Unprocessable Entity | Unsupported file type, invalid JSON, or no pinned dependencies found |
| 502 Bad Gateway | OSV.dev API returned an error or was unreachable |

GET /api/v1/history/

Return a paginated list of past scans, newest first.

Query parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| limit | integer (1–100) | 20 | Number of items to return |
| offset | integer (≥ 0) | 0 | Items to skip (for pagination) |

Response 200 OK

{
  "items": [
    {
      "scan_id": "3fa85f64-...",
      "filename": "requirements.txt",
      "file_type": "requirements",
      "total_dependencies": 24,
      "vulnerable_count": 6,
      "created_at": "2025-03-21T10:30:00Z"
    }
  ],
  "total": 42
}

GET /api/v1/history/{scan_id}

Return the full vulnerability report for a historical scan.

Path parameter

| Parameter | Type | Description |
| --- | --- | --- |
| scan_id | UUID | The scan ID returned by POST /scan/ or listed in GET /history/ |

Response 200 OK — same shape as POST /api/v1/scan/

Error responses

| Status | Condition |
| --- | --- |
| 404 Not Found | No scan with this ID exists |

GET /health

Liveness probe for load balancers and Docker health checks.

Response 200 OK

{ "status": "ok", "version": "1.0.0" }

Running Tests

The full test suite runs completely offline — SQLite in-memory replaces Postgres and all external HTTP calls (OSV.dev, Anthropic, Qwen) are intercepted by mocks. No API keys are required to run tests.

Setup

cd backend

# If not already done:
pip install -r requirements.txt
pip install aiosqlite           # async SQLite driver for in-memory test DB

Running the suite

# Run everything
pytest

# Verbose output with test names
pytest -v

# Run a single test module
pytest tests/test_parsers.py -v
pytest tests/test_osv_client.py -v
pytest tests/test_ai_enrichment.py -v
pytest tests/test_routes.py -v

# Run tests matching a keyword
pytest -k "qwen" -v
pytest -k "history" -v

# Stop on first failure
pytest -x

# Show slowest 10 tests
pytest --durations=10

Test modules

tests/test_parsers.py — 13 tests

Pure unit tests; no I/O. Cover both parsers exhaustively.

| Test | What it verifies |
| --- | --- |
| test_single_pinned_dependency | Basic == pin is extracted |
| test_multiple_pinned_dependencies | Multiple pins in correct order |
| test_comments_are_ignored | # lines don't appear in output |
| test_blank_lines_are_ignored | Empty lines don't crash |
| test_unpinned_dependency_is_skipped | requests alone goes to skipped_lines |
| test_range_constraint_is_skipped | >= constraint goes to skipped_lines |
| test_extras_are_stripped_from_name | uvicorn[standard] → name uvicorn |
| test_environment_markers_are_ignored | ; python_version>=... stripped |
| test_vcs_url_is_skipped | git+https://... goes to skipped_lines |
| test_flag_line_is_skipped | -r other.txt goes to skipped_lines |
| test_empty_file_returns_empty_result | No crash on empty input |
| test_pre_release_version | 1.0.0b2 is a valid pinned version |
| test_mixed_content | All types together, correct counts |
| test_exact_version_is_extracted | (package.json) "4.18.2" → scanned |
| test_caret_range_is_skipped | "^4.17.21" → skipped |
| test_dev_dependencies_are_included | devDependencies entries scanned |
| test_both_dep_groups_merged | dependencies + devDependencies combined |
| test_duplicate_package_deduped | Package in both groups appears once |
| test_equals_prefix_stripped | "=7.5.4" → version 7.5.4 |
| test_invalid_json_sets_parse_error | Malformed JSON sets parse_error |
| test_prerelease_semver_accepted | "14.0.0-canary.1" is valid |

tests/test_osv_client.py — 7 tests + 3 unit tests

test_query_batch_returns_pairs          Happy path, two deps one vuln each
test_query_batch_empty_input_returns_empty  No HTTP call made
test_query_batch_no_vulns               Empty vuln list for a safe package
test_query_batch_order_preserved        Output order matches input order exactly
test_osv_http_error_raises              HTTP 500 propagates as HTTPStatusError
TestExtractCvssScore::test_plain_float_score        "8.1" → 8.1
TestExtractCvssScore::test_no_severity_returns_none [] → None
TestExtractCvssScore::test_qualitative_high_maps_to_score  "HIGH" → 7.5
TestExtractCvssScore::test_trailing_number_in_vector       CVSS vector/9.8 → 9.8

Uses pytest-httpx to intercept all httpx calls at the transport layer.

tests/test_ai_enrichment.py — 24 tests

Structured into four classes:

TestSharedEnrichmentLogic — tests the parsing pipeline once, provider-agnostically:

test_clean_json_array_is_parsed
test_json_embedded_in_prose_is_extracted
test_malformed_response_returns_stubs
test_empty_osv_pairs_returns_empty_no_api_call
test_empty_string_response_returns_stubs
test_urgency_derived_when_missing_from_response
test_multiple_packages_all_returned
test_api_exception_falls_back_to_stubs

TestAnthropicProvider:

test_anthropic_happy_path
test_anthropic_no_key_returns_empty_string
test_anthropic_uses_configured_model

TestQwenProvider:

test_qwen_happy_path
test_qwen_no_key_returns_empty_string
test_qwen_uses_configured_model
test_qwen_system_prompt_passed_correctly
test_qwen_json_object_wrapper_handled
test_qwen_no_choices_returns_stubs

TestGetEnrichmentServiceFactory:

test_factory_returns_anthropic_service_by_default
test_factory_returns_qwen_service_when_configured
test_factory_is_cached
test_reset_clears_cache

tests/test_routes.py — 12 tests

Full HTTP round-trip tests using httpx.AsyncClient with ASGI transport. External calls (OSV, AI) are mocked at the service layer.

TestScanEndpoint::
  test_scan_requirements_returns_201
  test_scan_package_json_returns_201
  test_scan_unsupported_file_type_returns_422
  test_scan_invalid_json_returns_422
  test_scan_no_pinned_deps_returns_422
  test_scan_response_has_summary_fields
  test_scan_osv_failure_returns_502
  test_scan_persists_to_db               ← verifies actual DB row creation

TestHistoryEndpoint::
  test_history_returns_200
  test_history_pagination
  test_history_detail_not_found
  test_history_detail_returns_scan

TestHealthEndpoint::
  test_health_returns_ok

Test design rationale

Why SQLite instead of Postgres for tests? Running Postgres in CI requires a service container, adds ~30 seconds of startup overhead, and couples the test suite to infrastructure. SQLite (in-memory via aiosqlite) is structurally identical for every query DepWatch runs. Postgres-specific behaviour — UUIDs, timezone-aware timestamps, cascade deletes — is validated by the Alembic migration file and integration tests.

Why mock at the service layer, not the HTTP layer, for route tests? Mocking get_osv_client() and get_enrichment_service() return values is faster, clearer, and more stable than intercepting HTTP calls from inside a full request cycle.

Why reset_enrichment_service() in factory tests? get_enrichment_service() caches a singleton. Tests that need to control which provider is returned must clear the cache in setup_method and teardown_method to avoid cross-test pollution.


CI / GitHub Actions

The workflow at .github/workflows/ci.yml runs on every push to main or develop and on every pull request targeting main.

Jobs

test — runs on ubuntu-latest, Python 3.12:

  1. Check out repository
  2. Set up Python with pip cache keyed on requirements.txt
  3. pip install -r requirements.txt && pip install aiosqlite
  4. pytest tests/ -v --tb=short

No Postgres service container is required. DATABASE_URL is set to a dummy Postgres URL (never actually connected) and the test session uses the SQLite override from conftest.py.

lint — runs in parallel with test:

  1. pip install ruff==0.9.1
  2. ruff check app/ tests/
  3. ruff format --check app/ tests/

Badge

Add this to your fork's README:

[![CI](https://github.com/yourname/depwatch/actions/workflows/ci.yml/badge.svg)](https://github.com/yourname/depwatch/actions/workflows/ci.yml)

Database & Migrations

Schema

scans

| Column | Type | Notes |
| --- | --- | --- |
| id | UUID PK | uuid4() default |
| filename | VARCHAR(255) | Original upload filename |
| file_type | VARCHAR(50) | "requirements" or "package_json" |
| total_dependencies | INTEGER | Count of pinned deps parsed |
| vulnerable_count | INTEGER | Unique packages with ≥1 CVE |
| created_at | TIMESTAMPTZ | Server-side now() default |

vulnerabilities

| Column | Type | Notes |
| --- | --- | --- |
| id | UUID PK | uuid4() default |
| scan_id | UUID | FK → scans.id ON DELETE CASCADE |
| package_name | VARCHAR(255) | |
| package_version | VARCHAR(100) | |
| cve_id | VARCHAR(100) | CVE-YYYY-NNNNN or GHSA ID |
| cvss_score | FLOAT | Nullable — not all OSV entries include a score |
| severity | VARCHAR(50) | Critical / High / Medium / Low / Unknown |
| plain_english_explanation | TEXT | AI-generated. Nullable |
| exploit_scenario | TEXT | AI-generated. Nullable |
| remediation | TEXT | AI-generated. Nullable |
| urgency | VARCHAR(50) | Immediate / Soon / Low Priority. Nullable |

Indexes: ix_scans_created_at on scans.created_at; ix_vulnerabilities_scan_id on vulnerabilities.scan_id.

Running migrations

cd backend

# Apply all pending migrations
alembic upgrade head

# Check current revision
alembic current

# Generate a new migration from ORM model changes
alembic revision --autogenerate -m "add remediation_url column"

# Roll back one revision
alembic downgrade -1

Automatic table creation

On application startup, app/main.py calls Base.metadata.create_all() via the lifespan handler. This is idempotent — it creates tables that don't exist and does nothing for tables that do. It is not a replacement for Alembic migrations (which handle column additions, renames, and index changes), but it ensures the app works on first run from Docker Compose without a manual migration step.

For production deployments, run alembic upgrade head as an init container or as part of your deployment pipeline before starting the application.


Design Decisions

Single OSV batch call per scan

A single POST /v1/querybatch replaces N individual POST /v1/query calls. For a 30-dependency requirements.txt this cuts OSV round-trips from 30 to 1, reducing scan latency by ~95% and easing load on OSV's free infrastructure. The client auto-chunks at 100 items to stay safely below OSV's documented 1,000-item limit.
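The chunking amounts to slicing the dependency list and building one request body per slice. A sketch (the payload shape follows OSV's documented /v1/querybatch format; the function name is illustrative):

```python
def build_querybatch_payloads(deps, chunk_size=100):
    """Split (name, version, ecosystem) tuples into OSV /v1/querybatch bodies.

    OSV caps a batch at 1,000 queries; chunking at 100 stays well below that.
    """
    payloads = []
    for i in range(0, len(deps), chunk_size):
        payloads.append({
            "queries": [
                {"package": {"name": name, "ecosystem": ecosystem}, "version": version}
                for name, version, ecosystem in deps[i:i + chunk_size]
            ]
        })
    return payloads


# 230 dependencies -> 3 requests of 100, 100, and 30 queries
deps = [(f"pkg{n}", "1.0.0", "PyPI") for n in range(230)]
payloads = build_querybatch_payloads(deps)
```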

Single AI call per scan (sub-batched at 50)

All CVEs from a scan are sent to the AI provider in one call rather than one call per CVE. For a file with 8 vulnerabilities, this is 8× cheaper and faster. Claude's 200k and Qwen's 131k context windows comfortably accommodate even large scans. At 50+ CVEs the client makes sequential sub-batch calls to stay within typical output token limits.
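The sub-batching itself is just a slice loop — one provider call per slice of at most 50 CVEs. An illustrative sketch, where `call_provider` stands in for the real AI client call:

```python
async def enrich_in_batches(cves, call_provider, batch_size=50):
    """Send CVEs to the AI provider in sequential sub-batches of <= batch_size.

    call_provider is an async callable taking a list of CVEs and returning
    a list of enriched records in the same order.
    """
    enriched = []
    for i in range(0, len(cves), batch_size):
        enriched.extend(await call_provider(cves[i:i + batch_size]))
    return enriched
```

A scan with 49 CVEs stays a single call; 120 CVEs become three sequential calls of 50, 50, and 20.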

Provider abstraction via inheritance

BaseEnrichmentService owns all JSON parsing, validation, and stub logic. AnthropicEnrichmentService and QwenEnrichmentService implement only _call_api(user_message) -> str. Adding a third provider (e.g. Gemini, Mistral, Ollama) requires only a new subclass and a one-line addition to the factory function — no changes to the shared parsing pipeline.
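A sketch of that shape — class and function names follow the README's description, but the method bodies here are illustrative placeholders, not the project's real parsing logic:

```python
from abc import ABC, abstractmethod


class BaseEnrichmentService(ABC):
    """Owns message building, parsing, and stubs; subclasses supply the API call."""

    @abstractmethod
    async def _call_api(self, user_message: str) -> str: ...

    async def enrich(self, cves):
        raw = await self._call_api(self._build_message(cves))
        return self._parse_response(raw)

    def _build_message(self, cves):          # placeholder for prompt assembly
        return "\n".join(cves)

    def _parse_response(self, raw):          # placeholder for JSON parsing + stubs
        return raw.splitlines()


class AnthropicEnrichmentService(BaseEnrichmentService):
    async def _call_api(self, user_message: str) -> str:
        raise NotImplementedError("would call the Anthropic Messages API")


def get_enrichment_service(provider: str) -> BaseEnrichmentService:
    services = {"anthropic": AnthropicEnrichmentService}
    return services[provider]()  # adding a provider = one new dict entry
```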

Separate ORM and Pydantic schemas

SQLAlchemy models define the database schema; Pydantic schemas define the HTTP contract. A DB column rename doesn't break the API response shape. An API response field addition doesn't require a migration. The two layers can evolve at different paces.

UUID primary keys

Sequential integer IDs expose row counts (/history/5 implies there are at least 5 scans) and make enumeration attacks trivial. UUIDs are safe to expose in URLs, API responses, and logs.

expire_on_commit=False on the async session

FastAPI serialises the response object immediately after the route handler returns, inside the same get_db session scope. With SQLAlchemy's default (expire_on_commit=True), attributes are expired after commit(). Accessing them would trigger implicit lazy loads on an async session, causing MissingGreenlet errors. expire_on_commit=False keeps all loaded attributes available for the lifetime of the request.
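The synchronous analogue makes the failure mode visible: with the default, attributes expired by `commit()` cannot be loaded once the session is gone; with `expire_on_commit=False` they stay available. A sketch with a minimal stand-in model:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Scan(Base):
    __tablename__ = "scans"
    id = Column(Integer, primary_key=True)
    filename = Column(String(255))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine, expire_on_commit=False)
with Session() as session:
    scan = Scan(filename="requirements.txt")
    session.add(scan)
    session.commit()  # attributes are NOT expired by this commit

# The session is closed, yet attribute access still works. With the default
# expire_on_commit=True this raises DetachedInstanceError here (and the
# async equivalent surfaces as MissingGreenlet).
print(scan.filename)
```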

SQLite for tests

Postgres-in-CI requires a service container, extends pipeline time, and tightly couples test infrastructure to the database engine. SQLite (in-memory, via aiosqlite) is structurally identical for every query DepWatch executes. Postgres-specific behaviour is verified by the Alembic migration and the Docker Compose integration environment.

get_db commits on success, rolls back on exception

The get_db dependency commits inside try and rolls back inside except. Route handlers call await db.flush() (not commit()) to write rows within the transaction, letting the dependency control the transaction boundary. This prevents partially-written scans if the serialisation step raises after the insert.
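The dependency follows the standard commit-on-success, rollback-on-error generator pattern. A sketch (the factory indirection is added here only so the pattern is testable without a database; the real `get_db` binds the project's async session factory directly):

```python
def make_get_db(session_factory):
    """Build a FastAPI dependency that owns the transaction boundary."""

    async def get_db():
        async with session_factory() as session:
            try:
                yield session          # route handler runs here; it only flush()es
                await session.commit() # commit once the handler (and response) succeed
            except Exception:
                await session.rollback()
                raise
    return get_db
```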

Prompts as constants

VULNERABILITY_ENRICHMENT_SYSTEM_PROMPT is a module-level string constant in services/prompts.py, not an f-string or a database record. This makes it reviewable in code review, diffable in git, and importable in tests. The template is provider-agnostic — it instructs the model in terms of input/output JSON schemas, not in provider-specific syntax.


Troubleshooting

The Reflex frontend takes a very long time to start

The first reflex run (or docker compose up --build) compiles the Vite bundle, which downloads npm packages and runs the build. This takes 60–120 seconds on a cold start. Subsequent starts use the cached .web/ directory (or the reflex_web Docker volume) and complete in ~5 seconds.

If it seems stuck, check the container logs:

docker compose logs -f frontend

Look for the line App running at: http://localhost:3000 — that's when compilation is complete.

ANTHROPIC_API_KEY is not set warning at startup

This is expected behaviour when the key is missing. The application will start and scans will succeed, but vulnerability reports will use stubs rather than AI-generated enrichment. To enable full enrichment, add your key to .env and restart the backend.

AI enrichment unavailable in scan results

This appears in the plain_english_explanation field when the AI call fails or no key is configured. The scan result is still valid — OSV data (CVE IDs, CVSS scores) is present and correct. Check:

  1. Is ANTHROPIC_API_KEY or QWEN_API_KEY set in .env?
  2. Does AI_PROVIDER match the key you've provided?
  3. Check backend logs: docker compose logs backend | grep "API error"

No pinned dependencies found error (422)

DepWatch only scans exact versions. If your file contains only range constraints:

# requirements.txt with ranges — will return 422
flask>=3.0.0
requests~=2.28

Generate a pinned file:

pip freeze > requirements-pinned.txt

Or for npm:

npm shrinkwrap      # generates npm-shrinkwrap.json (rename to package.json)
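For requirements.txt, pinned-version detection amounts to accepting only `==` lines and skipping everything else — an illustrative sketch, not DepWatch's exact parser:

```python
import re

# exact-pin lines only: "name==version"; ranges (>=, ~=, <) do not match
PINNED = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([A-Za-z0-9.+!*_-]+)\s*$")


def parse_pinned(text):
    """Return (name, version) pairs for exactly-pinned lines; skip the rest."""
    pinned = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # strip trailing comments
        match = PINNED.match(line)
        if match:
            pinned.append((match.group(1), match.group(2)))
    return pinned
```

A file containing only `flask>=3.0.0` and `requests~=2.28` would yield an empty list, which is what triggers the 422.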

OSV returns no vulnerabilities for known-vulnerable packages

OSV.dev data is updated continuously but may lag new CVE publications by hours. Also verify:

  • The package name exactly matches the PyPI or npm registry name (case-insensitive for PyPI, exact-case for npm)
  • The version is genuinely affected — a patched version may correctly show no vulnerabilities

Database connection errors on startup

The backend waits for Postgres to pass its health check before starting (via the depends_on: condition: service_healthy setting in Compose). If you see connection errors, check:

docker compose ps
# Verify the `db` service shows "(healthy)"

docker compose logs db
# Look for "database system is ready to accept connections"

Port conflicts

Default ports: 3000 (frontend), 3001 (Reflex internal), 8000 (backend), 5432 (Postgres). To change them, edit docker-compose.yml and frontend/rxconfig.py.


Contributing

Contributions are welcome. Please follow this workflow:

# 1. Fork and clone
git clone https://github.com/yourname/depwatch.git
cd depwatch

# 2. Create a feature branch
git checkout -b feat/your-feature-name

# 3. Make changes and add tests

# 4. Verify locally
cd backend
pip install -r requirements.txt aiosqlite
pytest -v                        # all tests must pass
ruff check app/ tests/           # no lint errors
ruff format app/ tests/          # code must be formatted

# 5. Push and open a pull request
git push origin feat/your-feature-name

CI will run automatically on your PR. Merges to main require passing tests and lint.

Adding a new AI provider

  1. Create a subclass of BaseEnrichmentService in app/services/ai_enrichment.py:
    class GeminiEnrichmentService(BaseEnrichmentService):
        async def _call_api(self, user_message: str) -> str:
            # call Gemini API, return raw text response
            ...
  2. Add a new Literal option to ai_provider in app/config.py
  3. Add any provider-specific settings fields to Settings
  4. Register the new class in get_enrichment_service()
  5. Add tests in tests/test_ai_enrichment.py following the TestQwenProvider pattern
  6. Update .env.example and this README

Licence

MIT — see LICENSE for full text.


Built with FastAPI · Reflex · OSV.dev · Claude · Qwen
